Add creation_mode for LARGEFILE #36

dustinvannoy-db · 2023-08-31T21:26:42Z

Add a LARGEFILE creation mode option to improve performance when creating a Hyper file from a large dataset on Apache Spark. It does this by saving the data as multiple Parquet files and passing an ARRAY of the Parquet file paths to the COPY command.
Note: This only works on Databricks using DBFS.
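
For readers unfamiliar with the approach, here is a minimal sketch of how a COPY command over an ARRAY of Parquet paths could be assembled. It is not the code from this PR: the helper name `build_copy_command`, the table name, and the file paths are hypothetical, and the exact statement hyperleaup issues may differ.

```python
from typing import Sequence


def build_copy_command(table_name: str, parquet_paths: Sequence[str]) -> str:
    """Build one COPY statement that loads several Parquet files at once.

    Hypothetical helper for illustration only; not part of hyperleaup.
    """
    if not parquet_paths:
        raise ValueError("at least one Parquet file path is required")
    # Double any embedded single quotes so each path is a valid SQL string literal.
    literals = ", ".join("'" + p.replace("'", "''") + "'" for p in parquet_paths)
    # Quote the table name as an identifier, doubling embedded double quotes.
    quoted_table = '"' + table_name.replace('"', '""') + '"'
    return f"COPY {quoted_table} FROM ARRAY[{literals}] WITH (FORMAT PARQUET)"


# Assumed example paths: part files written by Spark to a DBFS location.
sql = build_copy_command(
    "Extract",
    [
        "/dbfs/tmp/hyperleaup/part-0000.parquet",
        "/dbfs/tmp/hyperleaup/part-0001.parquet",
    ],
)
print(sql)
```

The resulting string would then be passed to the Hyper connection in a single call, so all part files are loaded by one command rather than one command per file.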

goodwillpunning · 2023-09-05T12:28:37Z

pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "hyperleaup"
-version = "0.1.1"
+version = "0.1.2"
 authors = [
    { name="Will Girten", email="[email protected]" },


@dustinvannoy-db mind if we add you as an author here?

Sounds good to me.

goodwillpunning · 2023-09-05T17:21:38Z

LGTM! Thank you for the code contrib, @dustinvannoy-db!

dustinvannoy-db and others added 6 commits August 31, 2023 09:05

Add LARGEFILE to creation mode options

03bac57

Add largefile handling to creator

0a57597

Add test for largefile creation mode

4bc6227

Update version, README, and examples.

27da0b2

Fix demo notebook

11e4b3d

Fix README link

c90e892

goodwillpunning reviewed Sep 5, 2023


goodwillpunning approved these changes Sep 5, 2023


goodwillpunning merged commit bf2a4e6 into goodwillpunning:master Sep 5, 2023
